Notes from tuning Disquuun with BenchmarkDotNet
Overview
Rough notes for now.
Default settings
BenchmarkDotNet=v0.10.11, OS=macOS 10.12.6 (16G1114) [Darwin 16.7.0]
Processor=Intel Core i7-7Y75 CPU 1.30GHz, ProcessorCount=4
.NET Core SDK=2.0.0
[Host] : .NET Core 2.0.0 (Framework 4.6.00001.0), 64bit RyuJIT
DefaultJob : .NET Core 2.0.0 (Framework 4.6.00001.0), 64bit RyuJIT
Method | Mean | Error | StdDev |
----------------- |---------:|---------:|---------:|
Take10Connection | 74.96 us | 1.449 us | 1.355 us |
Ran it a few times without much care.
Method | Mean | Error | StdDev |
------------------------- |---------:|---------:|---------:|
Take_10byte_2sock_async | 68.04 us | 28.83 us | 1.629 us |
Take_10byte_10sock_async | 72.87 us | 22.36 us | 1.263 us |
Take_10byte_30sock_async | 73.63 us | 29.12 us | 1.645 us |
Method | Mean | Error | StdDev |
------------------------- |---------:|---------:|----------:|
Take_10byte_2sock_async | 74.89 us | 16.02 us | 0.9051 us |
Take_10byte_10sock_async | 68.70 us | 36.14 us | 2.0421 us |
Take_10byte_30sock_async | 70.65 us | 42.91 us | 2.4246 us |
2018/01/13 5:32:34
2018/01/13 5:32:40
Method | Mean | Error | StdDev |
------------------------- |---------:|----------:|----------:|
Take_10byte_2sock_async | 74.17 us | 1.4725 us | 2.1118 us |
Take_10byte_10sock_async | 74.60 us | 0.9152 us | 0.8113 us |
Take_10byte_30sock_async | 75.57 us | 0.8857 us | 0.8285 us |
Full version. It feels like it got slightly faster, but that is only a feeling so far.
Method | Mean | Error | StdDev |
------------------------- |---------:|---------:|----------:|
Take_10byte_2sock_async | 74.52 us | 1.022 us | 0.7979 us |
Take_10byte_10sock_async | 75.39 us | 1.079 us | 1.0091 us |
Take_10byte_30sock_async | 78.41 us | 1.518 us | 2.2717 us |
Method | Mean | Error | StdDev |
---------------------------- |---------:|---------:|----------:|
Take_10byte_2sock_async | 141.5 us | 282.3 us | 15.949 us |
Take_10byte_10sock_async | 143.5 us | 235.4 us | 13.298 us |
Take_10byte_30sock_async | 167.5 us | 545.8 us | 30.840 us |
Take_10byte_2sock_pipeline | 138.9 us | 134.2 us | 7.584 us |
Take_10byte_10sock_pipeline | 140.2 us | 175.1 us | 9.892 us |
Looks like this can be run and verified without depending on Ubuntu.
Method | Mean | Error | StdDev |
------------------------ |---------:|----------:|----------:|
Take_10byte_2sock_sync | 52.78 us | 7.275 us | 0.4111 us |
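Since this is a library that handles TCP, the basic sync path can be timed more or less as is, but I still have not found the right way to measure the load of the async path.
Let's measure the speed of the internal mechanisms instead.
One likely slow spot is where it switches between multiple sockets.
I suspect that removing the lock would change things considerably.
Reference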
http://engineering.grani.jp/entry/2017/07/28/145035
A hashtable looks promising.
Method | Mean | Error | StdDev |
---------------------------- |----------:|----------:|-----------:|
Take_10byte_2sock_async | 68.21 us | 10.81 us | 0.6107 us |
Take_10byte_2sock_sync | 64.78 us | 53.97 us | 3.0491 us |
Take_10byte_10sock_async | 94.92 us | 43.23 us | 2.4424 us |
Take_10byte_30sock_async | 109.56 us | 418.98 us | 23.6731 us |
Take_10byte_2sock_pipeline | 102.49 us | 83.99 us | 4.7454 us |
Take_10byte_10sock_pipeline | 76.44 us | 33.99 us | 1.9203 us |
Take_10byte_30sock_pipeline | 77.62 us | 91.14 us | 5.1495 us |
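It is a good sign that pipeline seems fairly fast.
For now, what still needs to be added:
・An alert to add sockets when there are not enough
Take the time a command went onto the stack and the time the stacked command was taken back out, and keep the longest such interval? Something like "time spent stalled".
The number of stalls per unit time might also work. I would rather not use a thread, so it would be nice to do something tied to the moments when the events occur.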
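The stall idea above could be sketched roughly as follows. This is only an illustration: `StallTracker` and its members are hypothetical names, not part of Disquuun, and it assumes the pool can call one method when a command is stacked and another when it is unstacked. It uses a small lock for simplicity.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper (not Disquuun's API): tracks how long commands waited
// on the stack because no socket was free.
public sealed class StallTracker
{
    private readonly object gate = new object();
    private long longestStallTicks;
    private int stallCount;

    // Call when a command has to be stacked. Keep the returned timestamp
    // next to the stacked command.
    public long MarkStacked()
    {
        return Stopwatch.GetTimestamp();
    }

    // Call when the stacked command is finally handed to a socket.
    public void MarkUnstacked(long stackedAt)
    {
        Record(Stopwatch.GetTimestamp() - stackedAt);
    }

    // Bookkeeping only, separated from the clock so it can be checked alone.
    public void Record(long elapsedTicks)
    {
        lock (gate)
        {
            stallCount++;
            if (elapsedTicks > longestStallTicks)
            {
                longestStallTicks = elapsedTicks;
            }
        }
    }

    // Number of stalls seen so far.
    public int StallCount
    {
        get { lock (gate) { return stallCount; } }
    }

    // Longest single stall seen so far.
    public TimeSpan LongestStall
    {
        get
        {
            lock (gate)
            {
                return TimeSpan.FromSeconds((double)longestStallTicks / Stopwatch.Frequency);
            }
        }
    }
}
```

Both calls sit on code paths that already run when a command is stacked or unstacked, so no extra thread or timer is involved. A stalls-per-unit-time figure could be derived by reading `StallCount` at two points in time and dividing by the interval.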
Method | Mean | Error | StdDev |
---------------------------- |----------:|----------:|-----------:|
Take_10byte_2sock_async | 69.06 us | 23.87 us | 1.3487 us |
Take_10byte_2sock_sync | 56.41 us | 14.25 us | 0.8050 us |
Take_10byte_10sock_async | 69.36 us | 38.55 us | 2.1779 us |
Take_10byte_30sock_async | 72.14 us | 33.17 us | 1.8742 us |
Take_10byte_2sock_pipeline | 75.60 us | 29.17 us | 1.6484 us |
Take_10byte_10sock_pipeline | 77.12 us | 10.02 us | 0.5662 us |
Take_10byte_30sock_pipeline | 103.12 us | 196.04 us | 11.0764 us |
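-> The way available (the pick of a usable socket from the socket pool) is written could probably be changed, so that it does not have to read all the way into the socket.
Tried it ->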
Method | Mean | Error | StdDev |
---------------------------- |---------:|----------:|----------:|
Take_10byte_2sock_async | 81.98 us | 101.04 us | 5.709 us |
Take_10byte_2sock_sync | NA | NA | NA |
Take_10byte_10sock_async | 91.41 us | 157.85 us | 8.919 us |
Take_10byte_30sock_async | 95.51 us | 80.68 us | 4.558 us |
Take_10byte_2sock_pipeline | 99.40 us | 285.53 us | 16.133 us |
Take_10byte_10sock_pipeline | 85.39 us | 145.92 us | 8.245 us |
Take_10byte_30sock_pipeline | 76.18 us | 36.68 us | 2.072 us |
It got slower, lol.
Dropping that idea.
Method | Mean | Error | StdDev |
---------------------------- |---------:|----------:|----------:|
Take_10byte_2sock_async | 78.41 us | 44.05 us | 2.489 us |
Take_10byte_2sock_sync | 57.63 us | 28.17 us | 1.592 us |
Take_10byte_10sock_async | 76.53 us | 33.12 us | 1.871 us |
Take_10byte_30sock_async | 83.94 us | 28.41 us | 1.605 us |
Take_10byte_2sock_pipeline | 85.94 us | 97.83 us | 5.527 us |
Take_10byte_10sock_pipeline | 84.15 us | 233.62 us | 13.200 us |
Take_10byte_30sock_pipeline | 84.44 us | 18.17 us | 1.027 us |
Added benchmarks that push 10x the data through the pipeline variants. The benefit of batched sending shows up clearly relative to the data volume.
Method | Mean | Error | StdDev |
---------------------------- |----------:|-----------:|-----------:|
Take_10byte_2sock_async | 81.15 us | 9.596 us | 0.5422 us |
Take_10byte_2sock_sync | 59.28 us | 5.387 us | 0.3044 us |
Take_10byte_10sock_async | 99.69 us | 223.154 us | 12.6086 us |
Take_10byte_30sock_async | 78.28 us | 36.978 us | 2.0893 us |
Take_10byte_2sock_pipeline | 152.23 us | 40.314 us | 2.2778 us |
Take_10byte_10sock_pipeline | 156.67 us | 80.243 us | 4.5339 us |
Take_10byte_30sock_pipeline | 184.29 us | 272.888 us | 15.4187 us |
Running it on a 4-core Ubuntu instance on ConoHa gives the following.
Method | Mean | Error | StdDev |
-------------------------------- |----------:|-----------:|----------:|
Take_10byte_2sock_async | 111.04 us | 131.1 us | 7.406 us |
Take_10byte_2sock_sync | 80.81 us | 108.5 us | 6.132 us |
Take_10byte_10sock_async | 130.34 us | 304.1 us | 17.181 us |
Take_10byte_30sock_async | 116.58 us | 310.2 us | 17.529 us |
Take_10byte_2sock_sync_10item | 895.81 us | 1,387.7 us | 78.405 us |
Take_10byte_10sock_async_10item | 343.30 us | 582.9 us | 32.935 us |
Take_10byte_30sock_async_10item | 373.80 us | 321.7 us | 18.178 us |
Take_10byte_2sock_pipeline | 131.05 us | 168.6 us | 9.528 us |
Take_10byte_10sock_pipeline | 129.62 us | 244.6 us | 13.823 us |
Take_10byte_30sock_pipeline | 134.58 us | 187.4 us | 10.586 us |
Take_10byte_2sock_pipeline_10 | 299.17 us | 413.9 us | 23.386 us |
Take_10byte_10sock_pipeline_10 | 273.35 us | 382.1 us | 21.588 us |
Take_10byte_30sock_pipeline_10 | 298.77 us | 276.1 us | 15.601 us |
So that is roughly how it looks.
The async variants all pay a WaitHandle penalty, so these scores seem decent given that.
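The WaitHandle penalty mentioned above presumably comes from the benchmark method having to block until the async callback fires. A minimal sketch of that pattern, assuming a callback-style API: `AsyncBenchSupport`, `RunAndWait` and the `start` delegate are hypothetical stand-ins, not Disquuun's actual interface.

```csharp
using System;
using System.Threading;

public static class AsyncBenchSupport
{
    // Starts a callback-style operation and blocks the calling thread until
    // the callback has delivered its value. `start` is a hypothetical
    // stand-in for an async call that takes a completion callback.
    public static T RunAndWait<T>(Action<Action<T>> start, int timeoutMs)
    {
        T result = default(T);
        using (var done = new ManualResetEvent(false))
        {
            start(value =>
            {
                result = value;
                done.Set(); // wake up the waiting benchmark thread
            });

            // This wait is the extra cost that the sync variants do not pay.
            if (!done.WaitOne(timeoutMs))
            {
                throw new TimeoutException("callback did not fire in time");
            }
        }
        return result;
    }
}
```

Every async measurement then includes the cost of signalling and waking a thread on top of the work itself, which is one plausible reason the sync numbers come out lower.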
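At this point, a logic-only implementation that does not actually use TCP sockets is complete.
With sockets that do almost nothing, the speed looks like this.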
Method | Mean | Error | StdDev |
-------------------------------- |----------:|-----------:|----------:|
Take_10byte_2sock_async | 4.018 us | 1.2750 us | 0.0720 us |
Take_10byte_2sock_sync | 1.500 us | 0.2568 us | 0.0145 us |
Take_10byte_10sock_async | 4.180 us | 0.6578 us | 0.0372 us |
Take_10byte_30sock_async | 4.136 us | 0.6864 us | 0.0388 us |
Take_10byte_2sock_sync_10item | 14.721 us | 3.6871 us | 0.2083 us |
Take_10byte_10sock_async_10item | 17.819 us | 4.7120 us | 0.2662 us |
Take_10byte_30sock_async_10item | 18.440 us | 12.3032 us | 0.6952 us |
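Also made it possible to add loop and similar operations.
Hmm, why does the benchmark drop so sharply as soon as I introduce a base class? Let me try an interface instead.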
Method | Mean | Error | StdDev |
-------------------------------- |----------:|------------:|-----------:|
Take_10byte_2sock_async | 4.263 us | 1.6034 us | 0.0906 us |
Take_10byte_2sock_sync | 1.572 us | 1.3900 us | 0.0785 us |
Take_10byte_10sock_async | 4.242 us | 1.0398 us | 0.0588 us |
Take_10byte_30sock_async | 4.183 us | 0.6072 us | 0.0343 us |
Take_10byte_2sock_sync_10item | 15.413 us | 3.4801 us | 0.1966 us |
Take_10byte_10sock_async_10item | 18.348 us | 11.8447 us | 0.6692 us |
Take_10byte_30sock_async_10item | 18.033 us | 0.7730 us | 0.0437 us |
Take_10byte_2sock_pipeline | 4.753 us | 0.7805 us | 0.0441 us |
Take_10byte_10sock_pipeline | 4.670 us | 0.8726 us | 0.0493 us |
Take_10byte_30sock_pipeline | 4.529 us | 0.5373 us | 0.0304 us |
Take_10byte_2sock_pipeline_10 | 25.391 us | 5.5216 us | 0.3120 us |
Take_10byte_10sock_pipeline_10 | 24.981 us | 5.8046 us | 0.3280 us |
Take_10byte_30sock_pipeline_10 | 23.549 us | 7.6720 us | 0.4335 us |
Take_10byte_2sock_loop_2 | 33.197 us | 11.0151 us | 0.6224 us |
Take_10byte_10sock_loop_2 | 55.504 us | 398.8342 us | 22.5349 us |
Take_10byte_30sock_loop_2 | 43.701 us | 391.8285 us | 22.1390 us |
It dropped further.
Well, with maintainability in mind this is probably the way to go, but it is slow.
Method | Mean | Error | StdDev |
-------------------------------- |----------:|----------:|----------:|
Take_10byte_2sock_async | 5.246 us | 9.635 us | 0.5444 us |
Take_10byte_2sock_sync | 1.737 us | 1.059 us | 0.0598 us |
Take_10byte_10sock_async | 4.994 us | 5.007 us | 0.2829 us |
Take_10byte_30sock_async | 5.447 us | 6.660 us | 0.3763 us |
Take_10byte_2sock_sync_10item | 16.707 us | 5.224 us | 0.2952 us |
Take_10byte_10sock_async_10item | 18.547 us | 9.683 us | 0.5471 us |
Take_10byte_30sock_async_10item | 18.483 us | 7.964 us | 0.4500 us |
Take_10byte_2sock_pipeline | 5.722 us | 6.309 us | 0.3565 us |
Take_10byte_10sock_pipeline | 4.856 us | 4.104 us | 0.2319 us |
Take_10byte_30sock_pipeline | 5.816 us | 3.047 us | 0.1721 us |
Take_10byte_2sock_pipeline_10 | 29.869 us | 42.256 us | 2.3876 us |
Take_10byte_10sock_pipeline_10 | 25.101 us | 3.782 us | 0.2137 us |
Take_10byte_30sock_pipeline_10 | 24.649 us | 3.549 us | 0.2005 us |
Reverted.
Method | Mean | Error | StdDev |
-------------------------------- |----------:|-----------:|----------:|
Take_10byte_2sock_async | 4.065 us | 1.7354 us | 0.0981 us |
Take_10byte_2sock_sync | 1.537 us | 0.9427 us | 0.0533 us |
Take_10byte_10sock_async | 5.889 us | 8.2307 us | 0.4651 us |
Take_10byte_30sock_async | 5.515 us | 2.4331 us | 0.1375 us |
Take_10byte_2sock_sync_10item | 17.907 us | 28.7123 us | 1.6223 us |
Take_10byte_10sock_async_10item | 22.929 us | 53.0855 us | 2.9994 us |
Take_10byte_30sock_async_10item | 21.841 us | 3.7723 us | 0.2131 us |
Take_10byte_2sock_pipeline | 4.795 us | 0.9523 us | 0.0538 us |
Take_10byte_10sock_pipeline | 4.656 us | 0.5813 us | 0.0328 us |
Take_10byte_30sock_pipeline | 4.735 us | 0.5510 us | 0.0311 us |
Take_10byte_2sock_pipeline_10 | 22.294 us | 6.0993 us | 0.3446 us |
Take_10byte_10sock_pipeline_10 | 23.485 us | 22.0893 us | 1.2481 us |
Take_10byte_30sock_pipeline_10 | 21.890 us | 7.6632 us | 0.4330 us |
Take_10byte_2sock_loop_2 | 29.881 us | 1.8279 us | 0.1033 us |
Take_10byte_10sock_loop_2 | 31.114 us | 15.8645 us | 0.8964 us |
Take_10byte_30sock_loop_2 | 33.355 us | 21.4925 us | 1.2144 us |
And finally, the version with the Pressure detection mechanism added.
Method | Mean | Error | StdDev |
-------------------------------- |----------:|-----------:|----------:|
Take_10byte_2sock_async | 4.040 us | 0.7995 us | 0.0452 us |
Take_10byte_2sock_sync | 1.549 us | 0.2411 us | 0.0136 us |
Take_10byte_10sock_async | 4.290 us | 0.9721 us | 0.0549 us |
Take_10byte_30sock_async | 4.220 us | 1.3237 us | 0.0748 us |
Take_10byte_2sock_sync_10item | 16.081 us | 16.7491 us | 0.9464 us |
Take_10byte_10sock_async_10item | 20.856 us | 18.8980 us | 1.0678 us |
Take_10byte_30sock_async_10item | 18.623 us | 6.1906 us | 0.3498 us |
Take_10byte_2sock_pipeline | 5.197 us | 11.6224 us | 0.6567 us |
Take_10byte_10sock_pipeline | 5.598 us | 3.1213 us | 0.1764 us |
Take_10byte_30sock_pipeline | 5.065 us | 0.7191 us | 0.0406 us |
Take_10byte_2sock_pipeline_10 | 22.567 us | 12.7912 us | 0.7227 us |
Take_10byte_10sock_pipeline_10 | 25.333 us | 6.0615 us | 0.3425 us |
Take_10byte_30sock_pipeline_10 | 23.269 us | 11.4749 us | 0.6484 us |
Take_10byte_2sock_loop_2 | 32.199 us | 16.3175 us | 0.9220 us |
Take_10byte_10sock_loop_2 | 35.051 us | 4.2557 us | 0.2405 us |
Take_10byte_30sock_loop_2 | 34.979 us | 2.1891 us | 0.1237 us |
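API for 3.5
For certain reasons I need a version that compiles under 3.5 rather than core2, so I ran the logic-only benchmark again on it. I will put it out as a 3.5 branch.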
Method | Mean | Error | StdDev |
-------------------------------- |----------:|------------:|-----------:|
Take_10byte_2sock_async | 9.446 us | 42.6027 us | 2.4071 us |
Take_10byte_2sock_sync | 1.847 us | 0.3184 us | 0.0180 us |
Take_10byte_10sock_async | 5.389 us | 3.2670 us | 0.1846 us |
Take_10byte_30sock_async | 5.558 us | 1.9537 us | 0.1104 us |
Take_10byte_2sock_sync_10item | 18.929 us | 27.2548 us | 1.5399 us |
Take_10byte_10sock_async_10item | 22.569 us | 5.9343 us | 0.3353 us |
Take_10byte_30sock_async_10item | 19.399 us | 3.3334 us | 0.1883 us |
Take_10byte_2sock_pipeline | 5.485 us | 0.6061 us | 0.0342 us |
Take_10byte_10sock_pipeline | 5.077 us | 1.6905 us | 0.0955 us |
Take_10byte_30sock_pipeline | 6.545 us | 5.5997 us | 0.3164 us |
Take_10byte_2sock_pipeline_10 | 31.838 us | 28.0285 us | 1.5837 us |
Take_10byte_10sock_pipeline_10 | 28.984 us | 7.3191 us | 0.4135 us |
Take_10byte_30sock_pipeline_10 | 26.362 us | 4.1309 us | 0.2334 us |
Take_10byte_2sock_loop_2 | 49.594 us | 440.7760 us | 24.9047 us |
Take_10byte_10sock_loop_2 | 43.963 us | 62.6907 us | 3.5421 us |
Take_10byte_30sock_loop_2 | 66.129 us | 55.2169 us | 3.1199 us |
As expected, it is noticeably slower. The difference is whether ConcurrentQueue is used, so it really is faster than lock.
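To make that comparison concrete, here is a small sketch of the two styles side by side. It assumes the 3.5 branch falls back to a `Queue<T>` guarded by `lock` (`System.Collections.Concurrent` only arrived with .NET Framework 4). `LockedPool` and `ConcurrentPool` are hypothetical names for illustration, not Disquuun's classes, and the pool holds plain socket ids to keep it short.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical 3.5-style pool: every access takes the same lock.
public sealed class LockedPool
{
    private readonly Queue<int> queue = new Queue<int>();
    private readonly object gate = new object();

    public void Return(int socketId)
    {
        lock (gate) { queue.Enqueue(socketId); }
    }

    public bool TryTake(out int socketId)
    {
        lock (gate)
        {
            if (queue.Count == 0)
            {
                socketId = -1; // nothing available
                return false;
            }
            socketId = queue.Dequeue();
            return true;
        }
    }
}

// Hypothetical core 2.0-style pool: ConcurrentQueue handles the
// synchronization internally, so callers take no explicit lock.
public sealed class ConcurrentPool
{
    private readonly ConcurrentQueue<int> queue = new ConcurrentQueue<int>();

    public void Return(int socketId)
    {
        queue.Enqueue(socketId);
    }

    public bool TryTake(out int socketId)
    {
        if (queue.TryDequeue(out socketId))
        {
            return true;
        }
        socketId = -1; // nothing available
        return false;
    }
}
```

Both classes behave the same from the caller's side, so either could be dropped into the same benchmark to check where the gap actually comes from.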
Adjusted a few API parameters and added debug logging (turned on).
The final run for dotnet core 2.0.
Method | Mean | Error | StdDev |
-------------------------------- |----------:|-----------:|----------:|
Take_10byte_2sock_async | 4.656 us | 6.2569 us | 0.3535 us |
Take_10byte_2sock_sync | 1.730 us | 0.9745 us | 0.0551 us |
Take_10byte_10sock_async | 4.337 us | 0.4410 us | 0.0249 us |
Take_10byte_30sock_async | 4.262 us | 0.7247 us | 0.0409 us |
Take_10byte_2sock_sync_10item | 17.134 us | 8.7683 us | 0.4954 us |
Take_10byte_10sock_async_10item | 21.383 us | 16.5207 us | 0.9335 us |
Take_10byte_30sock_async_10item | 22.547 us | 3.6320 us | 0.2052 us |
Take_10byte_2sock_pipeline | 6.301 us | 1.8376 us | 0.1038 us |
Take_10byte_10sock_pipeline | 5.949 us | 2.8413 us | 0.1605 us |
Take_10byte_30sock_pipeline | 6.673 us | 4.2911 us | 0.2425 us |
Take_10byte_2sock_pipeline_10 | 29.744 us | 20.4818 us | 1.1573 us |
Take_10byte_10sock_pipeline_10 | 25.569 us | 10.5866 us | 0.5982 us |
Take_10byte_30sock_pipeline_10 | 23.797 us | 3.4032 us | 0.1923 us |
Take_10byte_2sock_loop_2 | 35.643 us | 21.8902 us | 1.2368 us |
Take_10byte_10sock_loop_2 | 35.144 us | 16.8500 us | 0.9521 us |
Take_10byte_30sock_loop_2 | 35.683 us | 12.4644 us | 0.7043 us |
Roughly twice as fast as the 3.5 version.